Supplementary Material -- Towards Reliable Model Selection for Unsupervised Domain Adaptation: An Empirical Study and A Certified Baseline
We first prove the first inequality using Jensen's inequality, which states that for a real-valued, convex function f and a random variable X, f(E[X]) <= E[f(X)]. Next, we leverage the properties of inequalities to prove the second inequality. Directly taking the source risk as the target risk is unreliable due to the distribution shift between the source and target domains. Reverse Validation instead performs a reversed adaptation from the pseudo-labeled target domain back to the source and uses the source risk of this reversed adaptation task for validation. However, this method has limited effectiveness in scenarios with severe domain shift between the source and target domains. (This work was completed while Dapeng (lhxxhb15@gmail.com) ...)
- Europe > Greece (0.04)
- Asia > Southeast Asia (0.04)
- Oceania > New Zealand (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Questionnaire & Opinion Survey (0.93)
- Personal > Obituary (0.45)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Banking & Finance > Economy (1.00)
- (3 more...)
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
Wei, Tianwen, Zhu, Bo, Zhao, Liang, Cheng, Cheng, Li, Biye, Lü, Weiwei, Cheng, Peng, Zhang, Jianhao, Zhang, Xiaoyu, Zeng, Liang, Wang, Xiaokun, Ma, Yutuan, Hu, Rui, Yan, Shuicheng, Fang, Han, Zhou, Yahui
In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initializations. Our findings suggest that the choice between these two approaches should consider both the performance of the existing dense checkpoints and the MoE training budget. We highlight two innovative techniques: gating logit normalization, which improves expert diversification, and adaptive auxiliary loss coefficients, allowing for layer-specific adjustment of auxiliary loss coefficients. Our experimental results validate the effectiveness of these methods. Leveraging these techniques and insights, we trained our upcycled Skywork-MoE on a condensed subset of our SkyPile corpus. The evaluation results demonstrate that our model delivers strong performance across a wide range of benchmarks.
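The gating logit normalization mentioned above can be sketched roughly as follows: each token's expert logits are standardized to zero mean and unit variance before the softmax, with a scale that controls routing sharpness. The function name, the `lam` scale, and the epsilon are assumptions for illustration, not taken from the report:

```python
import numpy as np

def normalized_gating(logits, lam=1.0, eps=1e-6):
    """Sketch of gating logit normalization for MoE routing.

    Standardizes each token's expert logits (last axis) to zero mean
    and unit variance, scales by `lam`, then applies a numerically
    stable softmax. A larger `lam` concentrates routing on fewer
    experts; a smaller one flattens the routing distribution.
    """
    mu = logits.mean(axis=-1, keepdims=True)
    sigma = logits.std(axis=-1, keepdims=True)
    z = lam * (logits - mu) / (sigma + eps)
    # Stable softmax over the expert axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

Because the standardized logits always have the same scale before the softmax, `lam` acts as a single knob for how peaked the gate is, which is one plausible mechanism for the improved expert diversification the report describes.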
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States (0.04)
- Asia > Middle East > Jordan (0.04)
Explaining Veracity Predictions with Evidence Summarization: A Multi-Task Model Approach
Cekinel, Recep Firat, Karagoz, Pinar
The rapid dissemination of misinformation through social media has increased the importance of automated fact-checking. Furthermore, studies on what deep neural models attend to when making predictions have grown in recent years. While significant progress has been made in this field, it has not yet reached a level of reasoning comparable to human reasoning. To address these gaps, we propose a multi-task explainable neural model for misinformation detection. Specifically, this work formulates the generation of explanations for the model's veracity predictions as a text summarization problem. Additionally, the performance of the proposed model is discussed on publicly available datasets, and the findings are evaluated against related studies.
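As a rough illustration of the multi-task setup, such models typically optimize a weighted sum of the veracity-classification loss and the summarization loss; the helper below and its `alpha` balancing weight are hypothetical, not the paper's actual formulation:

```python
def multitask_loss(veracity_loss, summary_loss, alpha=0.5):
    """Illustrative joint objective for a multi-task fact-checking model.

    Combines the classification (veracity) loss and the explanation
    (summarization) loss; `alpha` trades off the two tasks.
    """
    return alpha * veracity_loss + (1.0 - alpha) * summary_loss
```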
- Europe > United Kingdom (0.14)
- North America > United States > Montana (0.04)
- North America > United States > California (0.04)
- (2 more...)
- Media > News (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
- Health & Medicine > Therapeutic Area > Neurology > Multiple Sclerosis (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Communications > Social Media (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Mitigating Negative Transfer in Multi-Task Learning with Exponential Moving Average Loss Weighting Strategies
Lakkapragada, Anish, Sleiman, Essam, Surabhi, Saimourya, Wall, Dennis P.
Multi-Task Learning (MTL) is a growing subject of interest in deep learning, due to its ability to train models on multiple tasks more efficiently than a group of conventional single-task models. However, MTL can be impractical, as certain tasks can dominate training and hurt performance on others, making some tasks perform better in a single-task model than in a multi-task one. Such problems are broadly classified as negative transfer, and many prior approaches have been proposed in the literature to mitigate them. One current approach to alleviating negative transfer is to weight each of the losses so that they are on the same scale. While current loss-balancing approaches rely on either optimization or complex numerical analysis, none directly scales the losses based on their observed magnitudes. We propose multiple techniques for loss balancing based on scaling by the exponential moving average and benchmark them against current best-performing methods on three established datasets. On these datasets, they achieve comparable, if not higher, performance than current best-performing methods.
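A minimal sketch of the idea of scaling losses by their observed magnitudes: each task loss is divided by an exponential moving average of its own value, so all tasks contribute on a comparable scale. The class name, the `beta` decay, and the epsilon are illustrative assumptions, not the paper's exact method:

```python
class EMALossWeighter:
    """Sketch of EMA-based loss balancing for multi-task learning.

    Maintains an exponential moving average of each task's loss and
    weights each loss by the inverse of its EMA, so that a task whose
    raw loss is orders of magnitude larger no longer dominates training.
    """

    def __init__(self, num_tasks, beta=0.9, eps=1e-8):
        self.beta = beta
        self.eps = eps
        self.ema = [None] * num_tasks

    def weights(self, losses):
        out = []
        for i, loss in enumerate(losses):
            if self.ema[i] is None:
                self.ema[i] = loss  # initialize EMA with the first observation
            else:
                self.ema[i] = self.beta * self.ema[i] + (1.0 - self.beta) * loss
            out.append(1.0 / (self.ema[i] + self.eps))
        return out

    def balanced_total(self, losses):
        # Each scaled term is close to 1, so no single task dominates.
        return sum(w * loss for w, loss in zip(self.weights(losses), losses))
```

On the first step each scaled term equals roughly 1 regardless of the raw loss magnitude; afterwards the EMA tracks slow drifts in scale while still letting relative within-task progress show through.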
- North America > United States > California > Yolo County > Davis (0.14)
- North America > United States > California > Santa Clara County > Stanford (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
Building Intelligent Autonomous Navigation Agents
Breakthroughs in machine learning in the last decade have led to `digital intelligence', i.e. machine learning models capable of learning from vast amounts of labeled data to perform several digital tasks such as speech recognition, face recognition, machine translation and so on. The goal of this thesis is to make progress towards designing algorithms capable of `physical intelligence', i.e. building intelligent autonomous navigation agents capable of learning to perform complex navigation tasks in the physical world involving visual perception, natural language understanding, reasoning, planning, and sequential decision making. Despite several advances in classical navigation methods in the last few decades, current navigation agents struggle at long-term semantic navigation tasks. In the first part of the thesis, we discuss our work on short-term navigation using end-to-end reinforcement learning to tackle challenges such as obstacle avoidance, semantic perception, language grounding, and reasoning. In the second part, we present a new class of navigation methods based on modular learning and structured explicit map representations, which leverage the strengths of both classical and end-to-end learning methods, to tackle long-term navigation tasks. We show that these methods are able to effectively tackle challenges such as localization, mapping, long-term planning, exploration and learning semantic priors. These modular learning methods are capable of long-term spatial and semantic understanding and achieve state-of-the-art results on various navigation tasks.
- South America > Uruguay > Maldonado > Maldonado (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (7 more...)
- Research Report > New Finding (0.92)
- Instructional Material (0.92)
- Workflow (0.92)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Education > Educational Setting > Online (0.67)
- Health & Medicine > Therapeutic Area (0.67)
- Government > Regional Government > North America Government > United States Government (0.45)
Miej/Dynamic_Neural_Manifold
In this project, I've built a neural network architecture with a static execution graph that acts as a dynamic neural network, in which connections between the various neurons are controlled by the network itself. This is accomplished by manipulating the adjacency matrix representation of the network on a per-neuron basis, with cell elements representing a 'distance', and masking off connections that are within a threshold. Including a loss term based on the network's sparsity or processing time allows the architecture to optimize its structure for accuracy or speed. Alright, so hopefully I've caught your attention with the title. To begin, I'd like to explain a little about why I've created this. My educational background is actually in the sciences, just at the junction between chemistry and physics.
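A minimal sketch of the adjacency-masking idea, assuming each matrix entry holds a learned 'distance' and that, as the description puts it, connections within the threshold are masked off; the function names and the sparsity coefficient are illustrative, not taken from the repository:

```python
import numpy as np

def masked_adjacency(distance, threshold):
    """Zero out connections whose 'distance' entry falls within the
    threshold (per the project description), keeping the rest active.
    The result is a binary adjacency mask over neuron pairs.
    """
    return (distance > threshold).astype(float)

def sparsity_loss(mask, coef=0.01):
    """Penalty proportional to the fraction of active connections,
    giving the network an incentive to prune its own structure."""
    return coef * mask.mean()
```

Because the mask is derived from values the network itself produces, adding `sparsity_loss` to the task loss lets gradient descent trade accuracy against connection count, which matches the accuracy-versus-speed trade-off described above.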